Experimental Self-calibration from Four Views

Abstract. The main goal of self-calibration [2, 3, 8] is to compute the
intrinsic and extrinsic parameters of a camera without using a known
pattern. In this paper we focus on the calibration of a binocular head-eye
system from four views. The only information provided to the algorithm
is the fundamental matrices [3] and the point correspondences between
the 4 views. We exploit the information of the cross-correspondences to
improve the Euclidean reconstruction.

1 Introduction 

In this paper we address the problem of computing the intrinsic and extrinsic
calibration parameters in a binocular image sequence, given a set of point
correspondences. Most authors [2, 5, 8] have studied the case of point
correspondences, but have restricted their approach to the case where the
intrinsic parameters of the camera are constant and only 2 or 3 views are
taken into account, or have studied the monocular case for long sequences,
as in [7].

The generalization to the case where intrinsic parameters are non-constant
has already been addressed, but the analysis is usually restricted to the
recovery of the affine or projective structure of the scene.

This paper extends these previous works to the case of non-constant in- 
trinsic parameters and non-constant relative positioning of the cameras of the 
stereoscopic system. In particular, in the case of active vision, the extrinsic and 
intrinsic parameters of the visual sensor are modified dynamically. For instance, 
when tuning the zoom and focus of a lens, these parameters are modified and 
must be considered as dynamic parameters. It is thus necessary to attempt to de- 
termine dynamic calibration parameters by a simple observation of an unknown 
stationary scene, when performing a rigid motion. 

2 The Camera Model 

We use the well-known pinhole camera model (see Fig. 1) [3]. We assume a
perfect perspective projection with center C (the optical center) at a
distance f (the focal distance) from the retinal plane $\pi$. The plane
containing the optical center and parallel to the retinal plane is called
the focal plane.

Three main coordinate frames are defined: the world coordinate frame,
the camera frame defined by its origin C and the axes $(X_c, Y_c, Z_c)$, and
the normalized frame at focal distance equal to 1, whose origin is called the
principal point c, with axes $(X_c, Y_c)$.

Fig. 1. The pinhole model and our binocular head.

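Using projective coordinates, the 3D world point $M = [X, Y, Z]^T$ and its
retinal normalized image $m = [u, v]^T$ are related by:

$$ s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
   = \underbrace{A \, [\, I_3 \mid 0 \,] \, D}_{P}
     \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \qquad (1) $$

where s is a scale factor, A is the matrix of the intrinsic parameters and D
the matrix of the extrinsic parameters:

$$ A = \begin{pmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{pmatrix},
   \qquad
   D = \begin{pmatrix} R & t \\ 0 & 1 \end{pmatrix}. $$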

We have chosen to represent the 3x1 translation vector t in spherical
coordinates (t is defined up to a scale factor and chosen to be of unit norm),
the rotation matrix R with the Rodrigues formulation [8], and A as depending
only on the four parameters $\alpha_u$, $\alpha_v$, $u_0$ and $v_0$. Other
authors use a fifth parameter $\theta$, which measures the non-orthogonality
of the pixels, but its value is always smaller than the noise [1].
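
As an illustration of this parameterization, the following minimal Python
sketch builds R from a rotation vector with the Rodrigues formula, builds a
unit translation from two spherical angles, and projects a world point with
the four-parameter matrix A. The function names and the particular spherical
angle convention (polar angle from the Z axis, azimuth in the XY plane) are
assumptions made for this example; the paper does not specify them.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rodrigues(r):
    """Rotation matrix for the rotation vector r (unit axis times angle)."""
    angle = math.sqrt(sum(c * c for c in r))
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if angle < 1e-12:
        return identity
    kx, ky, kz = (c / angle for c in r)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = matmul(K, K)
    s, c = math.sin(angle), 1.0 - math.cos(angle)
    # R = I + sin(angle) K + (1 - cos(angle)) K^2
    return [[identity[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]

def unit_translation(polar, azimuth):
    """Unit translation from two spherical angles (convention assumed here)."""
    return [math.sin(polar) * math.cos(azimuth),
            math.sin(polar) * math.sin(azimuth),
            math.cos(polar)]

def project(alpha_u, alpha_v, u0, v0, R, t, M):
    """Pixel coordinates (u, v) of the 3D world point M = [X, Y, Z]."""
    # Extrinsic part D: world frame to camera frame.
    Xc, Yc, Zc = (sum(R[i][k] * M[k] for k in range(3)) + t[i]
                  for i in range(3))
    # Intrinsic part A, followed by division by the scale factor s = Zc.
    return (alpha_u * Xc / Zc + u0, alpha_v * Yc / Zc + v0)
```

For example, with no rotation, a unit translation along Z and the default
intrinsic values used later in the paper, project(800.0, 800.0, 255.0, 255.0,
rodrigues([0.0, 0.0, 0.0]), unit_translation(0.0, 0.0), [1.0, 2.0, 3.0])
gives (455.0, 655.0).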

3 The Fundamental Matrix

As already demonstrated by Faugeras in [2], the Longuet-Higgins equation
relating the fundamental matrix F and the point correspondences q and q'
between two views can be written:

$$ q'^T F q = 0, \qquad (2) $$

with:

$$ F = A'^{-T} E A^{-1}, \qquad (3) $$

where E = TR is the essential matrix. T is a skew-symmetric matrix defined
by the translation vector t such that $Tx = t \wedge x$ for every 3D vector x
($\wedge$ denotes the cross-product). Of course, $E = A^T F A$ when both
cameras are identical.
For any pair of views we are trying to compute 3 parameters for the rotation,
2 for the translation (defined up to a scale factor), and 4 intrinsic
parameters. Since the essential 3x3 matrix E = TR is of rank two, and because
A and A' are invertible, F is of rank two. Being also defined only up to a
scale factor, F has 7 degrees of freedom, so we can compute 7 parameters from
each fundamental matrix: the motion between the two views and 2 intrinsic
parameters. The principal point $(u_0, v_0)$ is fixed, and only the factors
$\alpha_u$ and $\alpha_v$ need to be estimated.
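
The relation (3) and the rank-two property can be checked numerically. The
short Python sketch below assembles F from the intrinsic parameters and the
motion (R, t), then evaluates the left-hand side of (2) for one
correspondence. The helper names are hypothetical, and the closed-form
inverse relies on A having the four-parameter upper triangular form of
Sect. 2.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def skew(t):
    """Matrix T such that T x equals the cross-product of t and x."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def intrinsic_inverse(alpha_u, alpha_v, u0, v0):
    """Closed-form inverse of the four-parameter matrix A."""
    return [[1.0 / alpha_u, 0.0, -u0 / alpha_u],
            [0.0, 1.0 / alpha_v, -v0 / alpha_v],
            [0.0, 0.0, 1.0]]

def fundamental(A_inv, A_prime_inv, R, t):
    """F = A'^{-T} E A^{-1} with E = T R, as in equation (3)."""
    E = matmul(skew(t), R)
    return matmul(transpose(A_prime_inv), matmul(E, A_inv))

def epipolar_residual(F, q, q_prime):
    """Left-hand side of equation (2): q'^T F q."""
    Fq = [sum(F[i][k] * q[k] for k in range(3)) for i in range(3)]
    return sum(q_prime[i] * Fq[i] for i in range(3))

def det3(m):
    """Determinant of a 3x3 matrix (zero when the rank is at most two)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
```

For a noise-free correspondence generated with the same A, A', R and t, the
residual and det3(F) both vanish up to rounding error.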



4 Calibrating the Binocular Head-Eye System 

We use a binocular head-eye system (Fig. 1). A description of this active
visual system can be found in [6]. To calibrate the head-eye system we need
to know the intrinsic parameters of each camera at any position (in this
study $A_i$ with i = 1..4), the displacement of each camera (right $D_r$ and
left $D_l$), the displacement between both cameras (stereo displacements $D$
and $D'$), and finally the cross-displacements ($D^{\circ}$, $D^{*}$). From
Fig. 2 we can easily write: $D' D_l = D_r D$, $D^{\circ} = D_r D$ and
$D_r = D' D^{*}$. This means that we can compute all rotation matrices and
translation vectors from, for instance, $R_r$, $t_r$, $R$, $t$, $R'$ and $t'$.
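
The composition rules read from Fig. 2 can be sketched in Python with 4x4
homogeneous matrices. This is only an illustration under the assumption that
the three given displacements (right, first stereo pair, second stereo pair)
are expressed with mutually consistent translation scales; the function
names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rot_z(angle):
    """Rotation about the Z axis, used here only to build test data."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rigid(R, t):
    """4x4 homogeneous displacement with rotation R and translation t."""
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def rigid_inverse(D):
    """Inverse displacement: rotation R^T and translation -R^T t."""
    Rt = [[D[j][i] for j in range(3)] for i in range(3)]
    t = [D[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return rigid(Rt, t_inv)

def remaining_displacements(D_r, D, D_prime):
    """Left and cross displacements deduced from D_r, D and D'."""
    D_circ = matmul(D_r, D)              # cross: D_circ = D_r D
    D_prime_inv = rigid_inverse(D_prime)
    D_l = matmul(D_prime_inv, D_circ)    # from D' D_l = D_r D
    D_star = matmul(D_prime_inv, D_r)    # from D_r = D' D_star
    return D_l, D_circ, D_star
```

Multiplying the results back, D' D_l reproduces D_r D and D' D_star
reproduces D_r, which is a convenient consistency check on an implementation.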




Fig. 2. Our stereoscopic system (left images and right images).

4.1 The Non-Linear Minimization Algorithm

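The input is the six fundamental matrices relating the four views (or at
least the first four), $F_{12}$, $F_{13}$, $F_{24}$, $F_{34}$, $F_{14}$,
$F_{23}$, and the correspondences. We denote by $m_{i,j}^{ik}$ (respectively
$m_{k,j}^{ik}$) the j-th correspondence between the images i and k, as seen
in image i (respectively k), and by $n_{ik}$ the number of such
correspondences. As explained in Sect. 3, we obtain from each F an estimate
of R and t once the intrinsic parameters are known. We compute the output
intrinsic and extrinsic parameters by substituting equation (3) for each F
in the following criterion and minimizing:

$$
\begin{aligned}
\min \Big[ & \sum_{j=1}^{n_{12}} \big( (m_{2,j}^{12})^T F_{12} \, m_{1,j}^{12} \big)^2
 + \sum_{j=1}^{n_{13}} \big( (m_{3,j}^{13})^T F_{13} \, m_{1,j}^{13} \big)^2
 + \sum_{j=1}^{n_{24}} \big( (m_{4,j}^{24})^T F_{24} \, m_{2,j}^{24} \big)^2 \\
 & + \sum_{j=1}^{n_{34}} \big( (m_{4,j}^{34})^T F_{34} \, m_{3,j}^{34} \big)^2
 + \sum_{j=1}^{n_{14}} \big( (m_{4,j}^{14})^T F_{14} \, m_{1,j}^{14} \big)^2
 + \sum_{j=1}^{n_{23}} \big( (m_{3,j}^{23})^T F_{23} \, m_{2,j}^{23} \big)^2 \Big]
\end{aligned}
$$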
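
To make the structure of this criterion concrete, here is a minimal Python
sketch that evaluates the sum of squared epipolar residuals for a given set
of fundamental matrices. It only evaluates the cost: in the paper each F is
re-expressed through equation (3) and the minimization itself is delegated
to a library routine. The dictionary layout keyed by view pairs and the
function names are assumptions made for this example.

```python
def epipolar_residual(F, m_a, m_b):
    """m_b^T F m_a for homogeneous image points m_a and m_b."""
    Fm = [sum(F[i][k] * m_a[k] for k in range(3)) for i in range(3)]
    return sum(m_b[i] * Fm[i] for i in range(3))

def criterion(fundamentals, matches):
    """Sum of squared epipolar residuals over all available view pairs.

    fundamentals: {(a, b): F_ab} with F_ab a 3x3 nested list.
    matches:      {(a, b): [(m_a, m_b), ...]} with homogeneous 3-vectors.
    """
    total = 0.0
    for pair, F in fundamentals.items():
        for m_a, m_b in matches.get(pair, []):
            total += epipolar_residual(F, m_a, m_b) ** 2
    return total
```

With the six pairs (1,2), (1,3), (2,4), (3,4), (1,4) and (2,3) as keys, the
returned value corresponds to the bracketed sum; passing only the first four
pairs corresponds to the reduced input mentioned in the text.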



4.2 The Parameters Computed Depending on the Model Used

Model 0, the simplest model, assumes that D = D' and that the intrinsic
parameters are constant. This is the case when the vergence of the right
camera, the zoom and the focus are not changed. The principal point is fixed
to the center of the image: (255, 255) pixels. To reduce the number of
intrinsic parameters (Model 1), we use the fact [1] that the quotient
$C_0 = \alpha_u / \alpha_v = 0.7$ is constant, so we can write the intrinsic
parameter matrix for each camera position i:

$$ A_i = \begin{pmatrix} C_0 \, \alpha_{v,i} & 0 & u_{0,i} \\
                         0 & \alpha_{v,i} & v_{0,i} \\
                         0 & 0 & 1 \end{pmatrix} $$

Using Model 2, we do not set any parameter as being constant a priori.
If N is the number of views [7], we have 11N-15 independent parameters, which
is in our case 11*4-15 = 29. Table 1 summarizes the different models. We
denote by i = 1, ..., N the index of the view.



Table 1. The Euclidean parameters computed depending on the model.

                                       EUCLIDEAN PARAMETERS
Model     Constraints                  intrin   rotation   trans         TOTAL     N = 4

Model 0   u_0, v_0 fixed,              4        3N/2       3N/2 - 1      3N + 3    15
          A_i fixed, D = D'
Model 1   alpha_u / alpha_v = 0.7      3N       3(N-1)     3(N-1) - 1    9N - 7    29
Model 2   (none)                       4N       3(N-1)     3(N-1) - 1    10N - 7   33
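
The counts of Table 1 can be verified with a few lines of Python. This small
sketch simply encodes the per-model formulas of the table and the 11N-15
bound quoted above; the function names are hypothetical.

```python
def parameter_counts(model, n_views):
    """Number of Euclidean parameters estimated by each model of Table 1."""
    if model == 0:
        if n_views % 2:
            raise ValueError("Model 0 assumes stereo pairs (even N)")
        intrinsic = 4
        rotation = 3 * n_views // 2
        translation = 3 * n_views // 2 - 1
    elif model == 1:
        intrinsic = 3 * n_views
        rotation = 3 * (n_views - 1)
        translation = 3 * (n_views - 1) - 1
    elif model == 2:
        intrinsic = 4 * n_views
        rotation = 3 * (n_views - 1)
        translation = 3 * (n_views - 1) - 1
    else:
        raise ValueError("model must be 0, 1 or 2")
    return {"intrinsic": intrinsic, "rotation": rotation,
            "translation": translation,
            "total": intrinsic + rotation + translation}

def independent_parameters(n_views):
    """Number of independent parameters for N views: 11N - 15."""
    return 11 * n_views - 15
```

For N = 4 the totals are 15, 29 and 33 for Models 0, 1 and 2, so only Model 2
exceeds the 29 independent parameters available from four views.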

4.3 Euclidean Reconstruction Results Using Real Images

Two minimization routines have been used: e04fcf() of NAG and frprmn()
of Numerical Recipes (NR). The parameters are initialized to their default
values: $(u_0, v_0)$ are initialized to (255, 255) and
$(\alpha_u, \alpha_v)$ are set to (800, 800).

The extrinsic parameters are computed by developing (3), once the intrinsic
parameters are set. Figs. 3 and 4 show different views of the 3D Euclidean
reconstructed segments of the scene.

SEQUENCE 1: Intrinsic and stereo parameters are constant. Between the two 
pairs only the elevation of the stereo frame changes. The correspondences are 
extracted automatically using the grid. 




Fig. 3. From the left to the right: Camera View of the superimposed 3D Euclidean 
reconstruction using model 1 and the library NAG when the parameters are initialized 
to their default values, Top View of 3D Euclidean reconstruction result of our algorithm 
when the parameters are initialized to their default values, initialized by the result of 
"hard calibration" [4], and the result of the "hard calibration" method, respectively. 



SEQUENCE 2: Intrinsic and stereo parameters change between the two views.
We focused, zoomed and changed the vergence. The reconstruction using
models 0 and 1 is unsatisfactory. The parameters are initialized to their
default values. The correspondences are extracted by hand with sub-pixel
accuracy.




Fig. 4. Top, Camera and Front View of the 3D Euclidean reconstruction using
model 2 with the library NR (the results with NAG are similar).

5 Conclusions

We want to point out that although we are using a calibration grid for
convenient feature detection, only sequence 1 uses the 3D model of the grid,
to increase the number of correspondences and to compare the results with a
"hard calibration" method. The other sequence uses 50 correspondences. In
every case we are using a self-calibration method, that is to say, one that
does not use the 3D model of a known pattern to compute the perspective
projection matrix. The segments are reconstructed automatically.

The most relevant conclusions are: a) The simplified model 0 is not suitable
at all when extrinsic or intrinsic parameters change. Setting the principal
point to the center of the image when using a camera with a zoom is not
realistic at all. b) Model 1 is convenient when we do not use the zoom; the
intrinsic parameters then satisfy $\alpha_u = 0.7 \, \alpha_v$. It works
better than model 2 because there are 29 unknowns (the same as the number of
independent parameters).

The conjugate gradient method gives worse results than the modified Newton
method for real images.

In the case where the stereo displacement does not change between the two
frames, if we initialize the first frame's parameters with the "hard
calibration" method, then the results are very good. In the case where the
stereo parameters change, the algorithm minimizes the epipolar distance but
increases the reconstruction error.

References 

1. R. Enciso, T. Viéville, and O. Faugeras. Approximation du changement de
focale et de mise au point par une transformation affine à trois paramètres.
Traitement du Signal, 11(5), 1994.

2. O. Faugeras, Q.-T. Luong, and S. Maybank. Camera self-calibration: theory
and experiments. In 2nd ECCV, pages 321-334, Santa Margherita Ligure, Italy,
1992.

3. Q.-T. Luong. Matrice Fondamentale et Calibration Visuelle sur
l'Environnement. PhD thesis, Université de Paris-Sud, Orsay, 1992.

4. L. Robert. Perception Stéréoscopique de Courbes et de Surfaces
Tridimensionnelles, Application à la Robotique Mobile. PhD thesis, Ecole
Polytechnique, Palaiseau, France, 1992.

5. T. Viéville. Autocalibration of visual sensor parameters on a robotic
head. Image and Vision Computing, 12, 1994.

6. T. Viéville, E. Clergue, R. Enciso, and H. Mathieu. Experimenting 3d
vision on a robotic head. In The 12th Int. Conf. on Pattern Recognition,
pages 739-743, 1994.

7. T. Viéville, Q. Luong, and O. Faugeras. Motion of points and lines in the
uncalibrated case. International Journal of Computer Vision, 1994. To appear.

8. Z. Zhang, Q. Luong, and O. Faugeras. Motion of an uncalibrated stereo rig:
Self-calibration and metric reconstruction. Technical Report 2079, INRIA,
1993.






